Electronic Health Record Summarization over Heterogeneous and Irregularly Sampled Clinical Data

Author: Pivovarov Rimma
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2015

The increasing adoption of electronic health records (EHRs) has led to an unprecedented amount of patient health information stored in an electronic format. The ability to comb through this information is imperative, both for patient care and computational modeling. Creating a system to minimize unnecessary EHR data, automatically distill longitudinal patient information, and highlight salient parts of a patient’s record is currently an unmet need. However, summarization of EHR data is not a trivial task, as there exist many challenges with reasoning over this data. EHR data elements are most often obtained at irregular intervals, as patients are more likely to receive medical care when they are ill than when they are healthy. The presence of narrative documentation adds another layer of complexity, as the notes are riddled with over-sampled text, often caused by frequent copy-and-pasting during the documentation process. This dissertation synthesizes a set of challenges for automated EHR summarization identified in the literature and presents an array of methods for dealing with some of these challenges. We used hybrid data-driven and knowledge-based approaches to examine abundant redundancy in clinical narrative text, a data-driven approach to identify and mitigate biases in laboratory testing patterns with implications for using clinical data for research, and a probabilistic modeling approach to automatically summarize patient records and learn computational models of disease with heterogeneous data types. The dissertation also demonstrates two applications of the developed methods to important clinical questions: laboratory test overutilization and cohort selection from EHR data.

Columbia University Academic Commons

Identifying and mitigating biases in EHR laboratory tests

Author: Albers David J.
Elhadad Noémie
Pivovarov Rimma
Sepulveda Jorge L.
Publication venue: Elsevier Inc.
Publication date: 01/10/2014

Electronic health record (EHR) data show promise for deriving new ways of modeling human disease states. Although EHR researchers often use numerical values of laboratory tests as features in disease models, a great deal of information is contained in the context within which a laboratory test is taken. For example, the same numerical value of a creatinine test has different interpretation for a chronic kidney disease patient and a patient with acute kidney injury. We study whether EHR research studies are subject to biased results and interpretations if laboratory measurements taken in different contexts are not explicitly separated. We show that the context of a laboratory test measurement can often be captured by the way the test is measured through time. We perform three tasks to study the properties of these temporal measurement patterns. In the first task, we confirm that laboratory test measurement patterns provide additional information to the stand-alone numerical value. The second task identifies three measurement pattern motifs across a set of 70 laboratory tests performed for over 14,000 patients. Of these, one motif exhibits properties that can lead to biased research results. In the third task, we demonstrate the potential for biased results on a specific example. We conduct an association study of lipase test values to acute pancreatitis. We observe a diluted signal when using only a lipase value threshold, whereas the full association is recovered when properly accounting for lipase measurements in different contexts (leveraging the lipase measurement patterns to separate the contexts). Aggregating EHR data without separating distinct laboratory test measurement patterns can intermix patients with different diseases, leading to the confounding of signals in large-scale EHR analyses. This paper presents a methodology for leveraging measurement frequency to identify and reduce laboratory test biases.
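The abstract does not spell out how measurement patterns are computed, so the following is only a minimal sketch of the general idea of separating test values by how closely in time they were measured. The function names, the 7-day threshold, and the sample values are all hypothetical and are not taken from the paper.

```python
from datetime import datetime


def gaps_with_values(records):
    """Pair each measurement (after the first) with the gap in days
    since the previous measurement for the same patient."""
    ordered = sorted(records, key=lambda r: r[0])
    return [
        ((cur[0] - prev[0]).total_seconds() / 86400.0, cur[1])
        for prev, cur in zip(ordered, ordered[1:])
    ]


def split_by_gap(records, threshold_days):
    """Split values into those measured soon after a previous test
    ("dense") and those measured after a long gap ("sparse").
    The threshold is an illustrative assumption, not the paper's."""
    dense, sparse = [], []
    for gap, value in gaps_with_values(records):
        (dense if gap <= threshold_days else sparse).append(value)
    return dense, sparse


# Made-up (timestamp, value) pairs for a single hypothetical patient.
records = [
    (datetime(2014, 1, 1), 1.0),
    (datetime(2014, 1, 2), 1.4),
    (datetime(2014, 1, 3), 1.9),
    (datetime(2014, 7, 1), 1.1),
]
dense, sparse = split_by_gap(records, threshold_days=7)
```

Analyzing the two groups separately, rather than pooling all values, is one simple way to keep measurements taken in different clinical contexts from being mixed.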

Elsevier - Publisher Connector

PubMed Central

Genotator: A disease-agnostic tool for genetic annotation of disease

Author: DeLuca Todd F
Fusaro Vincent A
Jung Jae-Yoon
Pivovarov Rimma
Tonellato Peter J
Tong Mark
Wall Dennis P
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Disease-specific genetic information has been increasing at rapid rates as a consequence of recent improvements and massive cost reductions in sequencing technologies. Numerous systems designed to capture and organize this mounting sea of genetic data have emerged, but these resources differ dramatically in their disease coverage and genetic depth. With few exceptions, researchers must manually search a variety of sites to assemble a complete set of genetic evidence for a particular disease of interest, a process that is both time-consuming and error-prone. Methods: We designed a real-time aggregation tool that provides both comprehensive coverage and reliable gene-to-disease rankings for any disease. Our tool, called Genotator, automatically integrates data from 11 externally accessible clinical genetics resources and uses these data in a straightforward formula to rank genes in order of disease relevance. We tested the accuracy of coverage of Genotator in three separate diseases for which there exist specialty curated databases: Autism Spectrum Disorder, Parkinson's Disease, and Alzheimer Disease. Genotator is freely available at http://genotator.hms.harvard.edu. Results: Genotator demonstrated that most of the 11 selected databases contain unique information about the genetic composition of disease, with 2514 genes found in only one of the 11 databases. These findings confirm that the integration of these databases provides a more complete picture than would be possible from any one database alone. Genotator successfully identified at least 75% of the top ranked genes for all three of our use cases, including a 90% concordance with the top 40 ranked candidates for Alzheimer Disease. Conclusions: As a meta-query engine, Genotator provides high coverage of both historical genetic research and recent advances in the genetic understanding of specific diseases. As such, Genotator provides a real-time aggregation of ranked data that remains current with the pace of research in the disease fields. Genotator's algorithm appropriately transforms query terms to match the input requirements of each targeted database and accurately resolves named synonyms to ensure full coverage of the genetic results with official nomenclature. Genotator generates an Excel-style output that is consistent across disease queries and readily importable to other applications.

Springer - Publisher Connector

Columbia University Academic Commons

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

Temporal trends of hemoglobin A1c testing

Author: David J Albers
George Hripcsak
Jorge L Sepulveda
Noémie Elhadad
Rimma Pivovarov
Publication venue: 'BMJ'

Crossref

Data from: Survival Analysis with Electronic Health Record Data: Experiments with Chronic Kidney Disease

Author: David Albers (785167)
Dr. Herbert Chase (785180)
Dr. Yolanda Hagar (785178)
Miss Rimma Pivovarov (785179)
Prof. Noémie Elhadad (785182)
Prof. Vanja Dukic (785181)

This paper presents a detailed survival analysis for chronic kidney disease (CKD). The analysis is based on the EHR data comprising almost two decades of clinical observations collected at New York-Presbyterian, a large hospital in New York City with one of the oldest electronic health records in the United States. Our survival analysis approach centers around Bayesian multiresolution hazard modeling, with an objective to capture the changing hazard of CKD over time, adjusted for patient clinical covariates and kidney-related laboratory tests. Special attention is paid to statistical issues common to all EHR data, such as cohort definition, missing data and censoring, variable selection, and potential for joint survival and longitudinal modeling, all of which are discussed alone and within the EHR CKD context.

FigShare

Cloud computing for comparative genomics

Author: A Bateman
A Matsunaga
B Langmead
Dennis P Wall
DP Wall
DT Jones
J Dean
M Nei
MC Schatz
Parul Kudtarkar
Peter J Tonellato
Prasad Patil
R Chenna
Rimma Pivovarov
SF Altschul
TF Deluca
Vincent A Fusaro
Z Yang
Publication venue: BMC
Publication date: 01/05/2010

Background: Large comparative genomics studies and tools are becoming increasingly compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. Results: We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation took just under 70 hours and cost a total of $6,302 USD. Conclusions: The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provide a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.
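As a rough back-of-the-envelope reading of the figures quoted in this abstract, the sketch below derives approximate unit costs. It assumes all 100 nodes were billed for the full 70 hours and that exactly 300,000 processes ran; the abstract says "just under 70 hours" and "more than 300,000", so the results are approximations, not reported values. The function names are illustrative.

```python
def cost_per_node_hour(total_cost_usd, nodes, hours):
    """Approximate cost of one node for one hour, assuming every
    node ran (and was billed) for the whole duration."""
    return total_cost_usd / (nodes * hours)


def cost_per_process(total_cost_usd, processes):
    """Approximate average cost of a single process."""
    return total_cost_usd / processes


# Figures taken from the abstract; rounding them is an assumption.
node_hour_rate = cost_per_node_hour(6302, nodes=100, hours=70)
process_rate = cost_per_process(6302, processes=300_000)

print(f"{node_hour_rate:.2f} USD per node-hour")
print(f"{process_rate:.3f} USD per process")
```

Under these assumptions the run works out to roughly 0.90 USD per node-hour and about 0.02 USD per RSD process.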

Crossref

Springer - Publisher Connector

Harvard University - DASH

Directory of Open Access Journals

Identifying and mitigating biases in EHR laboratory tests

Author: David J. Albers
Jorge L. Sepulveda
Noémie Elhadad
Rimma Pivovarov
Publication venue: 'Elsevier BV'

Crossref

Automated methods for the summarization of electronic health records

Author: Noémie Elhadad
Rimma Pivovarov
Publication venue: 'Oxford University Press (OUP)'

Crossref